ABSTRACT

Some scientific inference tasks (including mass spectrum identification
[Dendral], medical diagnosis [Mycin], and math theory development
[AM]) have been successfully modelled as rule-directed search
processes. These rule systems are designed quite differently from
"pure production systems". By concentrating upon the design of one
program (AM), we shall show how 13 kinds of design deviations arise
from (i) the level of sophistication of the task that the system is
designed to perform, (ii) the inherent nature of the task, and (iii) the
designer's view of the task. The limitations of AM suggest even more
radical departures from traditional rule system architecture. All these
modifications are then collected into a new, complicated set of
constraints on the form of the data structures, the rules, the
interpreter, and the distribution of knowledge between rules and data
structures. These new policies sacrifice uniformity in the interests of
clarity, efficiency and power derivable from a thorough
characterization of the task. Rule systems whose architectures
conform to the new design principles will be more awkward for many
tasks than would "pure" systems. Nevertheless, the new architecture
should be significantly more powerful and natural for building rule
systems that do scientific discovery tasks.


1. The Basic Argument

Although rule-based computation was originally used for formal and systems purposes
[Post, Markov, Floyd], researchers in Artificial Intelligence (AI) found that the same
methodology was also useful for modelling a wide variety of sophisticated tasks. Many
of those early AI rule-based programs -- called "production systems" -- served as
information processing models of humans performing cognitive tasks in several domains
(digit recall [19], algebra word problem solving [1], poker playing [23], etc. [16,18]).

There were many design constraints present in the classical formal rule-based systems.
Many of these details were preserved in the AI production-rule-based programs (e.g.,
forcing all state information into a single string of tokens). But there were many
changes. The whole notion of "what a rule system really is" changed from an effective
problem statement to a tendency to solve problems in a particular way. One typical
corollary of this change of view was that instead of no external inputs whatsoever,
there was now a presumption of some "environment" which supplied new entries into
the token sequence. In the next section (see Figure 1) is an articulation of these
neo-classical (i.e., AI circa 1973; see [7]) principles for designing "pure" production
systems.

Due to the early successes, psychological applicability, and aesthetic simplicity
afforded by production systems, AI researchers began to write rule systems (RSs) to
perform informal inductive inference tasks (mass spectrum identification [4], medical
diagnosis [23] and consultation dialogue [6], speech understanding [14], non-resolution
theorem proving [0], math research [13], and many more).

Yet it seems that most of the large, successful RSs have violated many of the "pure
production system" guidelines. The purpose of this paper is to show that such
"exceptions" were inevitable, because any system satisfying the neo-classical design
constraints, though universal in principle, is too impoverished to represent complex
tasks for what they are.

The  essence  of  the  neo-classical  architecture  is  to  opt  for  simplicity  in  all  things,  since 
there  is  very  little  one  can  say  about  RSs  in  general.  As  more  becomes  known  about 
the  task  of  the  RS,  it  turns  out  that  some  of  that  new  knowledge  takes  the  form  of 
specific  constraints  on  the  design  of  the  RS  itself  (as  distinct  from  what  specific 
knowledge  we  choose  to  represent  within  that  design).  Sometimes  a new  constraint 
directly  contradicts  the  early,  domain-independent  one;  sometimes  it  is  merely  a 
softening or augmentation of the old constraint.

After  examining  the  "pure"  architecture,  we  shall  examine  in  detail  the  design  of  one 
particular  rule  system  which  discovers  and  studies  mathematical  concepts.  Deviations 
from  the  pure  architecture  will  be  both  frequent  and  extreme. 

Subsequent sections will analyze these differences. It will be shown that each one is
plausible -- usually for reasons which depend strongly on the "scientific discovery"
domain of the RS. Some of the limitations of this RS will be treated, and their
elimination will be seen to require abandoning still more of the original design
constraints.

When these modifications are collected, in the final section, we shall have quite a
different set of principles for building RSs. Not only will naivete have been lost: so
will generality (the breadth of kinds of knowledge representable, the totality of
tractable tasks). Rule systems conforming to the new design will be awkward for many
tasks (just as a sledge hammer is awkward for cracking eggs). However, they should
be significantly more powerful and natural for scientific inference tasks.


2. Early Design Constraints

By a rule system (RS) we shall mean any collection of condition-action rules, together
with associated data structures (DS; also called memories) which the rules may inspect
and alter. There must also be a policy for interpretation: detecting and firing relevant
rules.

These definitions are deliberately left vague. Many details must be specified for any
actual rule system (e.g., what may appear in the condition part of a rule?). This
specification process is what we mean by designing a RS.

Figure 1 contains an articulation of the design of the early general-purpose AI
production rule systems. Notice the common theme: the adequacy of simplicity in all
dimensions.


FIGURE 1: Neo-classical Rule System Architecture

1. Principle of Simple Memories. One or two uniform data structures define
sufficient memories for a rule system to read from and write into. The
format for entries in these structures is both uncomplicated and unchanging.

2. Principle of Simple DS Accesses. The primitive read and write operations are
as simple and low-level as possible; typically they are simply a membership-test
type of read, and an insert-new-element type of write. More
complicated, algorithmic operations on the memories are not available to the
rules.

3. Principle of Isolated DS Elements. Elements of the uniform DS cannot point
to (parts of) other elements. This follows from the preceding principle: if we
aren't allowed to chase pointers, there may as well not be any.

4. Principle of Continuous Attention. In addition to the one or two simple data
structures, there may be an external environment which continuously inserts
stimuli into the DS. The interleaving of stimuli and internally generated
symbols is managed quite trivially: (a) The stimuli are simply inserted into
the DS as new elements; (b) Each rule is so small and quick that no
"interruption" mechanism is necessary. The interpreter may ignore any
suddenly-added stimulus until the current rule finishes executing. The RS
may be viewed as "continuously" attending to the environment.

5. Principle of Opaque Rules. Rules need not have a format inspectable by
other rules, but rather can be coded in whatever way is convenient for the
programmer and the rule interpreter; i.e., the set of rules is not treated as
one of the RS's data structures. E.g., the condition parts of rules may be
barred from fully analyzing the set of productions [22], and the action parts
of rules may not be allowed to delete existing rules [24].

6. Principle of Simple Rules. Rules consist of a left- and a right-hand side
which are quite elementary: The left hand side (lhs, situation
characterization, IF-part, condition) is typically a pattern-match composed
with a primitive DS read access, and the right hand side (rhs, consequence,
THEN-part, action) is also simply a primitive DS write access. There is no
need for sophisticated bundles of DS accesses on either side of a rule. Thus
several extra rules should be preferred to a single rule with several actions.

7. Principle of Encoding by Coupled Rules. A collection of interrelated rules is
used to accomplish each subtask; i.e., wherever a subroutine would be used in
a procedural programming language. For example, programming an
iteration may require many rules "coupled" by writing and reading special
(i.e., otherwise meaningless) loop control notes in the data structure.

8. Principle of Knowledge as Rules. All knowledge of substance should be, can
be, and is represented as rules. This includes all non-trivial
domain-dependent information. The role of the DS is just to hold simple descriptive
information, intermediate control state messages, recent stimuli from the
environment, etc.

9. Principle of Simple Interpretation. The topmost control flow in the RS is via
a simple rule interpreter. After a rule fires, it is essential that any rule in
the system may potentially be the next one to fire (i.e., it is forbidden to
locate a set of relevant rules and fire them off in sequence). When the rhs of
a rule is executed, it can (and frequently will) drastically alter the situation
that determined which rules were relevant.

10. Principle of Closure. The representations allowed by (1-9) are sufficient and
appropriate for organizing all the kinds of knowledge needed for tasks for
which a given RS is designed.
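
To make principles 1, 2, 6 and 9 concrete, here is a minimal Python sketch of a "pure" interpreter. It is an illustration only, not code from any of the cited systems: the names Rule, run and demo_rules are hypothetical, and the refusal to re-insert a token that is already present is an added assumption that keeps the loop finite.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    lhs: frozenset  # tokens that must all be present (membership-test read)
    rhs: str        # one token inserted on firing (insert-new-element write)


def run(rules, memory, max_cycles=100):
    """Fire one rule per cycle; after each firing every rule is eligible again."""
    memory = set(memory)  # a single uniform memory of isolated tokens
    fired = []
    for _ in range(max_cycles):
        for rule in rules:
            # Assumption: a rule whose token is already present does not refire.
            if rule.lhs <= memory and rule.rhs not in memory:
                memory.add(rule.rhs)
                fired.append(rule.name)
                break  # rescan from the top: any rule may be the next to fire
        else:
            break  # no rule matched, so the system halts
    return memory, fired


# Two "coupled" rules: r2 can fire only after r1 has written the token "b".
demo_rules = [
    Rule("r1", frozenset({"a"}), "b"),
    Rule("r2", frozenset({"b"}), "c"),
]
final_memory, firing_order = run(demo_rules, {"a"})
```

Even this two-rule example shows the coupling of principle 7: the token "b" carries no meaning beyond letting r1 hand control to r2.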


This design was plausible a priori, and worked quite well for its initial applications (the
simulation of simple human cognitive processes [16,19,24]). But is this design proper
for any RS, regardless of its intended task? In particular, what about scientific
inference tasks? Over the years, several rule-based inference systems for scientific
tasks have been constructed. With each new success have come some deviations from
the above principles [7]. Were these mere aberrations, or is there some valid reason
for such changes in design?

We claim the latter. The task domain -- scientific discovery -- dictates a new and
quite different architecture for RSs. To study this phenomenon, we shall describe, in
the next section, one particular RS which defines new mathematical concepts, studies
them, and conjectures relationships between them. Subsequent sections will explore
the deviations of its design from the neo-classical constraints in Figure 1.


3. "AM": A Rule System For Math Theory Formation

A recent thesis [13] describes a program, called "AM", which gradually expands a base
of mathematical knowledge. The representation of math facts is somewhat related to
Actors [10] and Beings [12] in the partitioning of such domain knowledge into
effective, structured modules. Departing from the traditional control structures usually
associated with Actors, Beings, and Frames [15], AM concentrates on one "interesting"
mini-research question after another. These "jobs" are proposed by -- and rated by --
a collection of approximately 250 situation-action rules. Discovery in mathematics is
modelled in AM as a rule-guided exploration process. This view is explained below in
Section 3.1 (see also [21]). The representation of knowledge is sketched next, followed
by a much more detailed description of the rule-based control structure of AM.
Finally, in Section 3.5, the experimental results of the project are summarized.


3.1.  Discovery  in  Mathematics  as  Heuristic  Rule-Guided  Search 

The task which AM performs is the discovery of new mathematics concepts and
relationships between them. The simple paradigm it follows for this task is to maintain
a graph of partially-developed concepts, and to obey a large collection of "heuristics"
(rules which frequently lead to discoveries) which guide it to define and study the
most plausible thing next.

For example, at one point AM had some notions of sets, set-operations, numbers, and
simple arithmetic. One heuristic rule it knew said "If f is an interesting relation, Then
look at its inverse". This rule fired after AM had studied "multiplication" for a while.
The rhs of the rule then directed AM to define and study the relation "divisors-of"
(e.g., divisors-of(12) = {1,2,3,4,6,12}). Another heuristic rule which later fired said "If
f is a relation from A into B, then it's worth examining those members of A which map
into extremal members of B". In this case, f was matched to "divisors-of", A was
"numbers", B was "sets of numbers", and an extremal member of B might be, e.g., a
very small set of numbers. Thus this heuristic rule caused AM to define the set of
numbers with no divisors, the set of numbers with only 1 divisor, with only 2 divisors,
etc. One of these sets (the last one mentioned) turned out subsequently to be quite
important; these numbers are of course the primes. The above heuristic also directed
AM to study numbers with very many divisors; such highly-composite numbers were
also found to be interesting.

This same paradigm enabled AM to discover concepts which were much more primitive
(e.g., cardinality) and much more sophisticated (e.g., the fundamental theorem of
arithmetic) than prime numbers. We shall now describe the AM program in more detail.
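
The "extremal members" heuristic can be illustrated with a short Python sketch. This is not AM's LISP code; the function names and the range 1 to 30 are assumptions chosen for the illustration. It computes divisors-of and then groups numbers by how small their set of divisors is.

```python
from collections import defaultdict


def divisors_of(n):
    """The inverse image of multiplication: all d that divide n."""
    return {d for d in range(1, n + 1) if n % d == 0}


def group_by_divisor_count(numbers):
    """Collect the members of A (numbers) by the size of their image in B."""
    groups = defaultdict(list)
    for n in numbers:
        groups[len(divisors_of(n))].append(n)
    return dict(groups)


groups = group_by_divisor_count(range(1, 31))
numbers_with_two_divisors = groups[2]  # the extremal case that yields the primes
```

Under these assumptions the group with exactly two divisors is the list of primes below 30, no number falls into a group with zero divisors, and only the number 1 has a single divisor.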


3.2. Representation of Mathematical Knowledge

What exactly does it mean for AM to "have the notion of" a concept? It means that AM
possesses a frame-like data structure for that concept. For instance, here is how one
concept looked after AM had defined and explored it:

FIGURE 2: A Typical Concept


NAME: Prime Numbers
DEFINITIONS:
    ORIGIN: Number-of-divisors-of(x) = 2
    PREDICATE CALCULUS: Prime(x) ≡ (∀z)(z|x → z=1 XOR z=x)
    ITERATIVE: (for x>1): For i from 2 to x-1, ¬(i|x)
EXAMPLES: 2, 3, 5, 7, 11, 13, 17
    BOUNDARY: 2, 3
    BOUNDARY-FAILURES: 0, 1
    FAILURES: 12
GENERALIZATIONS: Numbers, Numbers with an even number of divisors,
    Numbers with a prime number of divisors
SPECIALIZATIONS: Odd Primes, Prime Pairs, Prime Uniquely-addables
CONJECS: Unique factorization, Goldbach's conjecture, Extrema of Divisors-of
ANALOGIES:
    Maximally-divisible numbers are converse extremes of Divisors-of
INTEREST: Conjectures tying Primes to Times, to Divisors-of, to closely related ops
WORTH: 800
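
A frame-like concept of this kind might be sketched in Python as a named record whose facets are open-ended slots. The class Concept and its fill_in method are hypothetical names for this illustration; AM itself stored its concepts as LISP structures.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    name: str
    worth: int = 0
    facets: dict = field(default_factory=dict)  # facet name -> list of entries

    def fill_in(self, facet, entries):
        """Add entries to a facet, skipping duplicates; return what was new."""
        slot = self.facets.setdefault(facet, [])
        added = []
        for entry in entries:
            if entry not in slot:
                slot.append(entry)
                added.append(entry)
        return added


primes = Concept(
    name="Prime Numbers",
    worth=800,
    facets={
        "EXAMPLES": [2, 3, 5, 7, 11, 13, 17],
        "GENERALIZATIONS": ["Numbers"],
        "SPECIALIZATIONS": ["Odd Primes", "Prime Pairs"],
    },
)
newly_added = primes.fill_in("EXAMPLES", [17, 19])  # 17 is already known
```

Keeping facets in a dictionary rather than as fixed fields reflects the point that a concept starts with only a few slots filled and gains entries as jobs are carried out.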


3.3. Top-level Control: An Agenda of Promising Questions

AM was initially given a collection of 115 core concepts, with only a few facets (i.e.,
slots) filled in for each. AM repeatedly chooses some facet of some concept, and tries
to fill in some entries for that particular slot. To decide which such job to work on
next, AM maintains an agenda of jobs, a global queue ordered by priority [2]. A
typical job is "Fill-in examples of Primes". The agenda may contain hundreds of
entries such as this one. AM repeatedly selects the top job from the agenda and tries
to carry it out. This is the whole control structure! Of course, we must still explain
how AM creates plausible new jobs to place on the agenda, how AM decides which job
will be the best one to execute next, and how it carries out a job.

If the job were "Fill in new Algorithms for Set-union", then satisfying it would mean
actually synthesizing some new procedures, some new LISP code capable of forming
the union of any two sets. A heuristic rule is relevant to a job if and only if executing
that rule brings AM closer to satisfying that job. Potential relevance is determined a
priori by where the rule is stored. A rule tacked onto the Domain/range facet of the
Compose concept would be presumed potentially relevant to the job "Fill in the
Domain of Insert-o-Delete". The lhs of each potentially relevant rule is evaluated to
determine whether the rule is truly relevant.

Once a job is chosen from the agenda, AM gathers together all the potentially relevant
heuristic rules -- the ones which might accomplish that job. They are executed, and
then AM picks a new job. While a rule is executing, three kinds of actions or effects
can occur:

(i) Facets of some concepts can get filled in (e.g., examples of primes may actually be
found and tacked onto the "Examples" facet of the "Primes" concept). A typical
heuristic rule which might have this effect is:

    If examples of X are desired, where X is a kind of Y (for some more general
    concept Y),
    Then check the examples of Y; some of them may be examples of X as well.

For the job of filling in examples of Primes, this rule would have AM notice that
Primes is a kind of Number, and therefore look over all the known examples of
Number. Some of those would be primes, and would be transferred to the
Examples facet of Primes.

(ii) New concepts may be created (e.g., the concept "primes which are uniquely
representable as the sum of two other primes" may somehow be deemed
worth studying). A typical heuristic rule which might result in this new concept
is:

    If some (but not most) examples of X are also examples of Y (for some
    concept Y),
    Then create a new concept defined as the intersection of those 2 concepts (X
    and Y).

Suppose AM has already isolated the concept of being representable as the sum
of two primes in only one way (AM actually calls such numbers
"Uniquely-prime-addable numbers"). When AM notices that some primes are in this set, the above
rule will create a brand new concept, defined as the set of numbers which are
both prime and uniquely prime addable.

(iii) New jobs may be added to the agenda (e.g., the current activity may suggest that
the following job is worth considering: "Generalize the concept of prime
numbers"). A typical heuristic rule which might have this effect is:

    If very few examples of X are found,
    Then add the following job to the agenda: "Generalize the concept X".
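
The first two kinds of rule effect can be sketched in Python as follows. This is a hedged illustration rather than AM's actual rules: the function names are hypothetical, the range 1 to 30 is arbitrary, and reading "some (but not most)" as "more than none and at most half" is an assumption.

```python
def is_prime(n):
    return n > 1 and all(n % d for d in range(2, n))


def prime_sum_ways(n):
    """Number of unordered pairs of primes (p, q) with p + q == n."""
    return sum(1 for p in range(2, n // 2 + 1) if is_prime(p) and is_prime(n - p))


def inherit_examples(examples_of_y, is_x):
    """Effect (i): check the examples of the general concept Y for examples of X."""
    return [e for e in examples_of_y if is_x(e)]


def maybe_intersect(x_examples, y_examples):
    """Effect (ii): propose the intersection of X and Y if some, but not most, overlap."""
    common = [e for e in x_examples if e in y_examples]
    fraction = len(common) / len(x_examples) if x_examples else 0.0
    return common if 0.0 < fraction <= 0.5 else None  # threshold is an assumption


numbers = list(range(1, 31))
primes = inherit_examples(numbers, is_prime)
uniquely_prime_addable = [n for n in numbers if prime_sum_ways(n) == 1]
new_concept = maybe_intersect(primes, uniquely_prime_addable)
odd_numbers = [n for n in numbers if n % 2 == 1]
rejected = maybe_intersect(primes, odd_numbers)  # almost all primes are odd
```

In this range four of the ten primes are uniquely prime-addable, so the intersection is proposed as a new concept, whereas the overlap with the odd numbers is too large to be interesting.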

The concept of an agenda is certainly not new: schedulers have been around for a long
time. But one important feature of AM's agenda scheme is a new idea: attaching -- and
using -- a list of quasi-symbolic reasons to each job which explain why the job is
worth considering, why it's plausible. It is the responsibility of the heuristic rules to
include reasons for any jobs they propose. For example, let's reconsider the heuristic
rule mentioned in (iii) above. It really looks more like the following:

    If very few examples of X are found,
    Then add the following job to the agenda: "Generalize the concept X", for the
    following reason: "X's are quite rare; a slightly less restrictive
    concept might be more interesting".

If the same job is proposed by several rules, then several different reasons for it may
be present. In addition, one ephemeral reason also exists: "Focus of attention" [9].
Any jobs which are related to the one last executed get "Focus of attention" as a
bonus reason. AM uses all these reasons to decide how to rank the jobs on the
agenda. Each reason is given a rating (by the heuristic which proposed it), and the
ratings are combined into an overall priority rating for each job on the agenda. The
jobs are ordered by these ratings, so it is trivial to select the job with the highest
rating. Note that if a job already on the agenda is re-proposed for a new reason, then
its priority will increase. If the job is re-proposed for an already-present reason,
however, the overall rating of the job will not increase. This turned out to be an
important enough phenomenon that it was presented in [13] as a necessary design
constraint.
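
The agenda behavior just described can be sketched in Python. The class Agenda and its method names are hypothetical, and the text does not give the formula by which AM combines reason ratings, so the plain sum used here is only an assumption that preserves the stated property: a new reason raises a job's priority, while a repeated reason does not.

```python
class Agenda:
    def __init__(self):
        self.jobs = {}  # job description -> {reason text: rating}

    def propose(self, job, reason, rating):
        """Record a reason for a job; an already-present reason is ignored."""
        reasons = self.jobs.setdefault(job, {})
        if reason not in reasons:
            reasons[reason] = rating

    def priority(self, job):
        # Assumption: ratings of distinct reasons are simply added together.
        return sum(self.jobs[job].values())

    def pop_best(self):
        """Remove and return the highest-rated job together with its reasons."""
        best = max(self.jobs, key=self.priority)
        return best, self.jobs.pop(best)


agenda = Agenda()
agenda.propose("Fill in examples of Primes", "No known examples so far", 120)
agenda.propose("Generalize the concept Primes", "Primes are quite rare", 100)
first_rating = agenda.priority("Generalize the concept Primes")
agenda.propose("Generalize the concept Primes", "Primes are quite rare", 100)
repeat_rating = agenda.priority("Generalize the concept Primes")
agenda.propose("Generalize the concept Primes", "Focus of attention", 50)
boosted_rating = agenda.priority("Generalize the concept Primes")
chosen_job, chosen_reasons = agenda.pop_best()
```

The reasons returned alongside the chosen job are what would let the system explain its choice to an observer.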

AM uses each job's list of reasons in other ways. Once a job has been selected, the
quality of the reasons is used to decide how much time and space the job will be
permitted to absorb, before AM quits and moves on to a new job. Another use is to
explain to the human observer precisely why the chosen top job is a plausible thing
for AM to concentrate upon.


3.4.  Low-level  Control:  A Lattice  of  Heuristic  Rules 

The hundreds of concepts AM possesses are interrelated in many ways. One main
organization is that provided by their Generalization and Specialization facets. The
concepts may be viewed as nodes on a large lattice whose edges are labelled
Genl/Spec. The importance of this organization stems from various heritability
properties. For example, Spec is transitive, so the specializations of Numbers include
not only Primes but all its specializations as well.

Let us describe a second, very important heritability property. Each of the 250
heuristic rules is attached to the most general (or abstract) concept for which it is
deemed appropriate. The relevance of heuristic rules is assumed to be inherited by all
its specializations. For example, a heuristic method which is capable of inverting any
function will be attached to the concept "Function", but it is certainly also capable of
inverting any permutation. If there are no known methods specific to the latter job,
then AM will follow the Genl links upward from Permutation to Bijection to Function...,
seeking methods for inversion. Of course the more general concepts' methods tend to
be weaker than those of the specific concepts.

In other words, the Genl/Spec graph of concepts induces a graph structure upon the
set of heuristic rules. This permits potentially relevant rules to be located efficiently.
Here is one more example of how this heritability works in practice: Immediately after
the job "Fill in examples of Set-equality" is chosen, AM asks each generalization of
Set-equality for help. Thus it asks for ways to fill in examples of any Predicate, any
Activity, any Concept, and finally for ways to fill in examples of Anything. One such
heuristic rule known to the Activity concept says: "If examples of the domain of the
activity f are already known, Then actually execute f on some random members of its
domain." Thus when AM applies this heuristic rule to fill in examples of Set-equality,
its Domain facet is inspected, and AM notes that Set-equality takes a pair of sets as its
arguments. Then AM accesses the Examples facet of the concept Set, where it finds a
large list of sets. The lhs is thus satisfied, so the rule is fired. Obeying the heuristic
rule, AM repeatedly picks a pair of the known sets at random, and sees if they satisfy
Set-equality (by actually running the LISP function stored in the Algorithms facet of
Set-equality). While this will typically return False, it will occasionally locate -- by
random chance -- a pair of equal sets.

Other heuristics, tacked onto other generalizations of Set-equality, provide additional
methods for executing the job "Fill in examples of Set-equality." A heuristic stored on
the concept Any-concept says to symbolically instantiate the definition. After spending
much time manipulating the recursive definition of Set-equality, a few trivial examples
(like {}={}) are produced. Notice that (as expected) the more general the concept is,
the weaker (more time-consuming, less chance for success) its heuristics tend to be.
For this reason, AM consults each concept's rules in order of increasing generalization.
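
Locating potentially relevant rules by rippling upward along Genl links might look like the following Python sketch. The tables GENL and RULES are a hypothetical fragment built from the Set-equality example, and rules are keyed by concept only; in AM they are tacked onto particular facets, a detail omitted here.

```python
# Hypothetical fragment of the Genl links and of the rules attached to concepts.
GENL = {
    "Set-equality": ["Predicate"],
    "Predicate": ["Activity"],
    "Activity": ["Any-concept"],
    "Any-concept": ["Anything"],
    "Anything": [],
}
RULES = {
    "Activity": ["execute f on random members of its domain"],
    "Any-concept": ["symbolically instantiate the definition"],
}


def potentially_relevant_rules(concept, genl=GENL, rules=RULES):
    """Collect rules level by level, from most specific to most general."""
    ordered, seen, frontier = [], set(), [concept]
    while frontier:
        next_frontier = []
        for current in frontier:
            if current in seen:
                continue  # a lattice node may be reachable along several paths
            seen.add(current)
            ordered.extend((current, rule) for rule in rules.get(current, []))
            next_frontier.extend(genl.get(current, []))
        frontier = next_frontier
    return ordered


found = potentially_relevant_rules("Set-equality")
```

Because the walk proceeds one generalization level at a time, the weaker rules of the more general concepts come last in the returned list.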


3.5.  Behavior  of  this  Rule  System 

As the preceding four sections indicate, the dynamic behavior of AM was as follows: a
job is chosen from the agenda, potentially relevant rules are located by their position
in the Genl/Spec lattice, their lhs's (left-hand sides) are evaluated to find those which
actually trigger, they are then executed (in order of decreasing specificity) until they
are all executed (or until some local, self-imposed limit on time or space is exceeded),
and the cycle repeats. AM has a modest facility that prints out a description of these
activities as they occur. Here is a tiny excerpt of this self-trace monologue.

** Job 65: ** Fill in Examples of the concept "Divisors-of".

    3 Reasons: (1) No known examples of Divisors-of so far.
               (2) TIMES, which is related to Divisors-of, is now very interesting.
               (3) Focus of attention: AM recently defined Divisors-of.

    26 examples found, in 9.2 seconds. e.g., Divisors-of(6)={1 2 3 6}.


** Job 66: ** Consider numbers having small sets of Divisors-of.

    2 Reasons: (1) Worthwhile to look for extreme cases.
               (2) Focus of attention: AM recently worked on Divisors-of.

    Filling in examples of numbers with 0 divisors.
        0 examples found, in 4.0 seconds.
        Conjecture: no numbers have precisely 0 divisors.

    Filling in examples of numbers with 1 divisors.
        1 examples found, in 4.0 seconds. e.g., Divisors-of(1) = {1}.
        Conjecture: 1 is the only number with precisely 1 divisor.

    Filling in examples of numbers with 2 divisors.
        24 examples found, in 4.0 seconds. e.g., Divisors-of(13) = {1 13}.
        No obvious conjecture. May merit more study.
        Creating a new concept: "Numbers-with-2-divisors".

    Filling in examples of numbers with 3 divisors.
        11 examples found, in 4.0 seconds. e.g., Divisors-of(49) = {1 7 49}.
        All numbers with 3 divisors are also Squares. Definitely merits more study.
        Creating a new concept: "Numbers-with-3-divisors".


** Job 67: ** Consider the square-roots of Numbers-with-3-divisors.

    2 Reasons: (1) Numbers-with-3-divisors are unexpectedly also Perfect Squares.
               (2) Focus of attention: AM recently worked on Nos-with-3-divisors.

    All square-roots of Numbers-with-3-divisors seem to be Numbers-with-2-divisors.
        e.g., Divisors-of(Square-root(169)) = Divisors-of(13) = {1 13}.
    Even the converse of this seems empirically to be true.
        i.e., the square of each No-with-2-divisors seems to be a No-with-3-divisors.
        The chance of coincidence is below acceptable limits.
    Boosting the interestingness rating of each of the concepts involved.


** Job 68: ** Consider the squares of Numbers-with-3-divisors.

    3 Reasons: (1) Squares of Numbers-with-2-divisors were interesting.
               (2) Square-roots of Numbers-with-3-divisors were interesting.
               (3) Focus of attention: AM recently worked on Nos-with-3-divisors.
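
The empirical conjecture of Job 67 is easy to re-check in a few lines of Python. This is an independent sanity check, not AM's procedure; the helper names and the search limit of 1000 are assumptions, since the trace does not say what range AM examined.

```python
import math


def divisors_of(n):
    return {d for d in range(1, n + 1) if n % d == 0}


def numbers_with_k_divisors(k, limit):
    return [n for n in range(1, limit + 1) if len(divisors_of(n)) == k]


LIMIT = 1000  # assumed search range
three_divisors = numbers_with_k_divisors(3, LIMIT)
two_divisors = set(numbers_with_k_divisors(2, LIMIT))

roots = [math.isqrt(n) for n in three_divisors]
# Every number with 3 divisors is a perfect square...
all_are_squares = all(r * r == n for r, n in zip(roots, three_divisors))
# ...whose square root has exactly 2 divisors.
roots_have_two_divisors = all(r in two_divisors for r in roots)
# Conversely, squaring a number with 2 divisors gives one with 3 divisors.
converse_holds = all(
    len(divisors_of(p * p)) == 3 for p in two_divisors if p * p <= LIMIT
)
```

Within this range both directions of the conjecture hold, which is the relationship AM reported between Numbers-with-2-divisors and Numbers-with-3-divisors.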


Now that we've seen how AM works, and we've been exposed to a bit of "local"
results, let's take a moment to discuss the totality of the mathematics which AM carried
out. AM began its investigations with scanty knowledge of a hundred elementary
concepts of finite set theory. Most of the obvious set-theoretic concepts and
relationships were quickly found (e.g., de Morgan's laws; singletons), but no
sophisticated set theory was ever done (e.g., diagonalization). Rather, AM discovered
natural numbers and went off exploring elementary number theory. Arithmetic
operations were soon found (as analogs to set-theoretic operations), and AM made
surprising progress in divisibility theory. Prime pairs, Diophantine equations, the
unique factorization of numbers into primes, Goldbach's conjecture -- these were some
of the nice discoveries by AM. Many concepts which we know to be crucial were never
uncovered, however: remainder, gcd, greater-than, infinity, proof, etc.

All the discoveries mentioned were made in a run lasting one cpu hour (Interlisp+100k,
SUMEX PDP-10 KI). Two hundred jobs in toto were selected from the agenda and
executed. On the average, a job was granted 30 cpu seconds, but actually used only
18 seconds. For a typical job, about 35 rules were located as potentially relevant, and
about a dozen actually fired. AM began with 115 concepts and ended up with three
times that many. Of the synthesized concepts, half were technically termed "losers"
(both by the author and by AM), and half the remaining ones were of only marginal
interest.

Although AM fared well according to several different measures of performance (see
Section 7.1 in [13]), of greatest significance are its limitations. This subsection will
merely report them, and the next section will analyze whether they were caused by
radical departures from the neo-classical production-system architecture, or from
departing not far enough from that early design.

As AM ran longer and longer, the concepts it defined were further and further from
the primitives it began with. Thus "prime-pairs" were defined using "primes" and
"addition", the former of which was defined from "divisors-of", which in turn came from
"multiplication", which arose from "addition", which was defined as a restriction of
"union", which (finally!) was a primitive concept (with heuristics) that we had supplied
to AM initially. When AM subsequently needed help with prime pairs, it was forced to
rely on rules of thumb supplied originally about unioning. Although the heritability
property of heuristics did ensure that those rules were still valid, the trouble was that
they were too general, too weak to deal effectively with the specialized notions of
primes and arithmetic. For instance, one general rule indicated that A∪B would be
interesting if it possessed properties absent both from A and from B. This translated
into the prime-pair case as "If p+q=r, and p, q, r are primes, then r is interesting if it
has properties not possessed by p or by q." The search for categories of such
interesting primes r was of course barren. It showed a fundamental lack of
understanding about numbers, addition, odd/even-ness, and primes.

As the derived concepts moved further away from finite set theory, the efficacy of the
initial heuristics decreased. AM began to "thrash", appearing to lose most of its
heuristic guidance. It worked on concepts like "prime triples", which is not a rational
thing to investigate. The key deficiency was the lack of adequate meta-rules [6]:
heuristics which cause the creation and modification of new heuristics.


^ This concept, and many of the other "omissions", could have been discovered by the existing heuristic rules in
AM. The paths which would have resulted in their definition were simply never rated high enough to explore.

Aside from the preceding major limitation, most of the other problems pertain to
missing knowledge. Many concepts one might consider basic to discovery in math are
absent from AM; analogies were under-utilized; physical intuition was absent; the
interface to the user was far from ideal; etc.


4.  Reexamining  the  Design 

Let us now consider the major components of a RS's design and how AM treated them:
the DS, the rules, the distribution of knowledge between DS and rules, and the rule
interpretation policy. For each component, AM's architecture failed to adhere strictly
to the pure RS guidelines. Were these departures worth the loss of simplicity? Were
the deviations due to the task domain (scientific discovery), to the task view
(heuristically guided growth of structured theories), or to other sources? These are
the kinds of questions we shall address in each of the following subsections.


4.1.  Data  Structures 

We recognize that a single uniform DS (e.g., an infinite STM [19]) is universal in the
Turing sense of being formally adequate: one can encode any representation in a
linear, homogeneous DS. The completeness of such a DS design notwithstanding, we
believe that encouraging several distinct, special-purpose DSs will enhance the
performance of a discovery system. That is, we are willing to sacrifice aesthetic purity
of DSs for clarity, efficiency, and power. In this section we will explore this tradeoff.

The data structures used in AM are unlike the uniform memories suggested by the first
design constraint (see Figure 1). One DS -- the agenda -- holds an ordered list of
plausible questions for the system to concentrate on, a list of jobs to work on.
Another DS is the graph of concepts AM knows about. Each concept itself consists of
much structured information (see Figure 2). The reasons AM has for each job have
information associated with them. Still other information is present as values of
certain functions and global variables: the cpu clock, the total number of concepts, the
last thing typed out to the user, the last few concepts worked on, etc. All these types
of information are accessed by the lhs's (left hand sides) of heuristic rules, and
affected by rhs's (some "deliberately" in the text of the rule, some "incidentally"
through a chain of if-added methods).

Why is there this multitude of diverse DSs? Each type of knowledge (jobs, math
knowledge, system status) needs to be treated quite differently. Since the primitive
operations will vary with the type of information, so should the DS. For jobs, the
primitive kinds of accesses will be: picking the highest-rated job, deleting the
lowest-rated one, reordering some jobs, merging new ones. A natural choice to make these
operations efficient is to keep the system's goals in a queue ordered by their rating or
partially-ordered by those ratings that are commensurable. For resource information,
the usual request is for some statistic of some class of primary data. To maintain a
table of such summary facts (like how much the CPU clock has run so far, or how many
concepts there are) is to introduce an unnecessary DS and incur exorbitant costs to
maintain many short-lived entries that will, most probably, never be used. It is far
more reasonable to run a summarizing procedure to develop just that ephemeral,
up-to-date information that you need. For math concepts, we have a much less volatile
situation. We view them as an ever-growing body of highly-interrelated facts.
Knowledge in this form is stable and rarely deleted. When new knowledge is added, a
great many "routine" inferences must be drawn. In a uniform, linear memory, each
would have to be drawn explicitly; in a structured one (as the Genl/Spec graph
structure provides) they may be accomplished through the tacit (analogical)
characteristics of the representation, simply by deciding where to place the
information.
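
The access pattern described above for jobs can be sketched as follows. This is only an
illustration in Python, not AM's Interlisp code: the names Job, Agenda, add, pop_best and
drop_worst are assumptions made for the example, and merging by keeping the higher
rating is a simplification of whatever rating formula a real system would use.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Job:
    """A hypothetical agenda entry: a task, its rating, and its reasons."""
    task: str
    rating: float
    reasons: List[str] = field(default_factory=list)


class Agenda:
    """Jobs kept sorted by descending rating (illustrative, not AM's code)."""

    def __init__(self) -> None:
        self.jobs: List[Job] = []

    def add(self, job: Job) -> None:
        # Merge a new job: if the task is already present, pool the reasons
        # and keep the higher rating; otherwise insert it as a new entry.
        for old in self.jobs:
            if old.task == job.task:
                for reason in job.reasons:
                    if reason not in old.reasons:
                        old.reasons.append(reason)
                old.rating = max(old.rating, job.rating)
                break
        else:
            self.jobs.append(job)
        # Reorder so the highest-rated job is always first.
        self.jobs.sort(key=lambda j: j.rating, reverse=True)

    def pop_best(self) -> Job:
        # Pick the highest-rated job.
        return self.jobs.pop(0)

    def drop_worst(self) -> Job:
        # Delete the lowest-rated job.
        return self.jobs.pop()
```

Each of the four primitive accesses named in the text (pick best, delete worst, reorder,
merge) is a named operation here; a rule would call them without knowing how the
ordering is maintained.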

Each kind of knowledge dictates a set of appropriate kinds of primitive operations to
be performed on it, which in turn suggest natural data structures in which to realize it.
The generality of this perspective on rule-based systems is made more plausible by
examining other RSs which deal with many types of knowledge (e.g., [5]). If this is so,
if the design proceeds from "knowledge to be represented" to "a data structure to
hold it", then fixing a priori the capabilities of the DS access primitives available to
rules is suspect.

Therefore, we advocate the opposite: the RS designer is encouraged to name every
combination of "machine" operations that together comprise a single conceptual access
of data by rules. In AM, it is quite reasonable to expect that a request like "find all
generalizations of a given concept" would be such a primitive (i.e., could be referred to
by name). Even though it might cause the "machine" (in this case, LISP) to run around
the Genl/Spec graph, a single rule can treat this as merely an "access" operation. The
use of complex tests and actions is not new; we simply claim that it is always
preferable to package knowledge (for which a reasonably fast algorithm is available)
as a single action (though it may have side-effects in the space of concepts) or a
single test (so long as its sole side-effect -- modulo caches -- is to signal). Primitive
tests and actions should be maximally algorithmic, not minimally computational.

The neo-classical view of designing a production rule system was that of defining a
machine. Our present view is that RSs do not compute so much as they guide attention.
In adopting this view (thereby separating the controller from the effector), we
recognize that we are giving up an attractive feature of pure rule systems: a
homogeneous basis for definition. For example, the rule system designer must now
spell out in detail the definitions of the DS accessing functions; but the designer of a
neo-classical RS is simply able to take as given the matching and inserting operations
(as specified in neo-classical principle #6, Figure 1), and he builds each more
complicated one out of these primitives^. In giving up the old view of the RS as an
abstract computing machine, the RS designer must use another homogeneous substrate
(e.g., LISP) in terms of which to define his DSs and especially the procedures that
process them. In exchange, he obtains a clear distinction between two kinds of
knowledge contained in the neo-classical rule: plausible proposals for what to do next,
and how to accomplish what might be proposed.

^ Either by stringing out a sequence of primitives on one side of a rule, or by handcrafting a tightly coupled
bundle of rules (so firing such a rule would simulate traversing one link of the kind that abound in AM's DSs).

We have seen that admitting complicated and varied DSs leads to stylized sets of DS
accesses. The DSs and their sets of read/write primitives must in turn be explicitly
defined (coded) by the designer. This seems like a high price to pay. Is there any
bright side to this? Yes, one rather interesting possibility is opened up. Not only the
RS designer, but the RS itself may define DSs and DS access functions. In AM, this
might take the form of dynamically defining new kinds of facets (slots). E.g., after
"Injective Function" is defined, and after some properties of it have been discovered, it
would be appropriate to introduce a new facet called "Inverse" for each (concept
representing an) injective function. In AM, the actual definitions of the facets of every
concept are complex enough (shared structure), inter-related enough (shared meaning),
and interesting enough (consistent heuristic worth) that a special concept was included
for each one (e.g., a concept called "Examples") which contained a definition,
description,... of the facet. Thus the same techniques for manipulating and discovering
math concepts may be applied to DS design concepts. Not only do math theories
emerge, so can new DS access functions (new slots; e.g., "Small Boundary Examples",
"Factorization", or "Inverse").

It should be noted that in opting for non-uniform DSs, we have not in general
sacrificed efficiency. One has only to compare the time to access a node in a tree,
versus in a linear list, to appreciate that efficiency may, in fact, be increased by
non-uniformity.

Just how tangled up a DS should we tolerate? Should memory elements be permitted
to refer to (to "know about") each other? We believe the answer to depend upon the
type of data structure involved. For the homogeneous DS called for in the neo-classical
design, much simplicity is preserved by forbidding this kind of interrelationship. But
consider a DS like AM's graph of concepts. It is growing, analogically interrelated, and
it contains descriptions of its elements. This richness (and sheer quantity) of
information can be coded only inefficiently in a uniform, non-self-referential manner.
For another example, consider AM's agenda of jobs. One reason for a job may simply
be the existence of some other job. In such a case, it seems natural for part of one
entry on the agenda (a reason part of one job) to point to another entry in the same
DS (point to another specific job on the agenda). Thus, inter-element pointers are
allowed, even though they blur a "pure" distinction between a DS and its entries.^
Inter-element references play a necessary role in organizing large bodies of highly
interrelated information into structured modules.

There is yet another motivation for special-purpose DSs when the task of the RS
includes sensing an external environment. Using a uniform memory, external stimuli
are dumped into the working memory and rub shoulders with all the other data. They
must then be distinguished from the others. ("Must" because to freely intermingle
what one sees or is told with what one thinks or remembers is to give way to endless
confusion.) How much cleaner, less distracting, and safer it is for stimuli to arrive in
their own special place -- a place which might well be a special purpose store such as
an intensity array (not even a list structure at all), or a low-level speech-segment
queue. A linear memory (e.g., an infinite STM) is of course adequate; one could tag
each incoming environmental stimulus with a special flag. But the design philosophy
we are proposing is aimed at maximizing clarity and efficiency, not uniformity or
universality.

We know that this view of DSs means making a specialized design effort for each class
of knowledge incorporated into the RS. But that is desirable, as it buys us three
things: (i) system performance is increased, (ii) some forms of automatic learning are
facilitated, (iii) knowledge is easier to encode.


4.2.  Rules 

In the "pure" view of RSs, the rule store is not a full-fledged DS of the RS. For
example, in Waterman's [24] poker player, rules may not be deleted. Rychener [22]
states that the only way his RS may inspect rules is by examining the effect of those
rules which have recently fired. Although AM had no explicit taboo against inspecting
rules, such analyses were in practice never possible, since the rules were ad hoc
blocks of LISP code. This eventually turned out to be the main limitation of the design
of AM. The ultimate impediment to further discovery was the lack of rules which could
reason about, modify, delete, and synthesize other rules. AM direly needed to
synthesize specialized forms of the given general heuristic rules (as new concepts
arose; see the end of 3.5).

We want our heuristic rules to be added, kept track of, reasoned about, modified,
deleted, generalized, specialized, ... whenever there is a good reason to do so. Note
that those situations may be very different from the ones in which such a rule might
fire. E.g., upon discovering a new, interesting concept, AM should try to create some
specially-tailored heuristic rules for it. They wouldn't actually fire until much later,
when their lhs's were triggered. After having constructed such rules, AM might
subject them to criticism and improvement as it explores the new concept.

In sum, we have found that the discovery of heuristic rules for using new math
concepts is a necessary part of the growth of math knowledge. Hence, following the
argument in 4.1, the rules themselves should be DSs, and each rule might be described
by a concept with effective (executable) and non-effective (purely descriptive) facets.
This lesson was made all the more painful because it was not new [5]. Apparently the
need for reasoning about rules is common to many tasks.

The current re-coding of AM does in fact have each rule represented as a concept.
What kinds of non-effective "facets" do they have? Recall that one of the features of
the original AM (as described in Section 3.3) was that with each rule were associated
some symbolic reasons which it could provide whenever it proposed a new job for the
agenda. So one kind of facet which every rule can possess is "Reasons". What others
are there? Some of them describe the rule (e.g., its average cost); some facets provide
a road map to the space of rules (e.g., which rule schemata are mere specializations of
the given one); some facets record its derivation (e.g., the rule was proposed as an
analog to rule X because ...), its redundancy (some other rules need not be tried if this
one is), etc.

There are some far-reaching consequences of the need to reason about rules just as if
they were any other concepts known to AM. When one piece of knowledge relates to
several rules, then one general concept, a rule schema, should exist to hold that
common knowledge. Since each rule is a concept, there will be a natural urge to
exploit the same Genl/Spec organization that proved so useful before. Heritability still
holds; e.g., any reason which explains rule R is also somehow a partial explanation of
each specialization of R.
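
The heritability of reasons along the Genl/Spec links between rules can be illustrated
with a small sketch. It is not taken from the re-coded AM; the class RuleConcept, its
fields genl and reasons, and the method all_reasons are hypothetical names chosen for
the example.

```python
from typing import List, Optional


class RuleConcept:
    """A rule stored as a concept with descriptive facets (hypothetical layout)."""

    def __init__(self, name: str, genl: Optional["RuleConcept"] = None,
                 reasons: Optional[List[str]] = None) -> None:
        self.name = name
        self.genl = genl              # the more general rule (schema), if any
        self.reasons = reasons or []  # the "Reasons" facet stored locally

    def all_reasons(self) -> List[str]:
        # Heritability: a reason explaining a schema is also a partial
        # explanation of each specialization, so walk up the Genl chain.
        collected: List[str] = []
        node: Optional[RuleConcept] = self
        while node is not None:
            for reason in node.reasons:
                if reason not in collected:
                    collected.append(reason)
            node = node.genl
        return collected
```

Knowledge common to several rules is written once, on the schema, and each
specialization sees it without duplication.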

Rule schemata have cause to exist simply because they generalize -- and hold much
information which would otherwise have to be duplicated in -- several specific rules.
They may tend to be "big" and less directly productive when executed, yet they are of
value in capturing the essence of the discovery techniques.^ We put "big" in quotes
because sheer length (total number of lhs tests allowed, total number of rhs actions) is
not directly what we're talking about here. A general rule schema will capture many
regularities, will express an idea common to several more specific rules. It will contain
dual forms of the same rule, sophisticated types of variable-binding (for the duration
of the rule application), and searching may even be required to find the actions of such
a general rule. We may even wish to consider every rule in the RS as a rule schema of
some level of generality, and much processing may go on to find the particular
instance(s) of it which should be applied in any particular situation.

Let us consider a rule schema called the "rule of enthusiasm". It subsumes several
rules in the original AM system (pp. 247-8 of [13]), e.g., those that said:

    If concept G is now very interesting, and G was created as a generalization
    of some earlier concept C,

    Give extra consideration to generalizing G, and to generalizing C in other
    ways.


and:


    If concept S proved to be a dead-end, and S was created as a specialization
    of some earlier concept C,

    Give less consideration to specializing S, and to specializing C in other ways
    in the future.


^ In AM, even the specific rules may be "big" in the sense that their very precise knowledge may involve much
testing to trigger and, once triggered, may conclude some elaborate results.

The proposed rule schema is:

    If concept X has very high/low interest and X can be derived from some
    concept C by means m,

    Give more/less consideration to finding (and elaborating) concepts derived
    from C, X (and their "neighbors") by means analogous to m.

There are four variables to be matched and coordinated in the lhs of this rule: a
concept X, the direction (high or low) of its extreme interest rating, a derivation
procedure m, and an associated source concept C. The action itself is to search for
jobs of a certain type and give them a corresponding (high or low) rating change.
Three types of matching are present: (i) ranging over a set of alternatives which are
known at the time the rule is written (e.g., the "high/low" alternative); (ii) ranging over
a set of alternatives which can be accessed easily at any moment the rule is run, like
the set of concepts and connections between them now in existence (e.g., the variables
X and C range over this kind of set); (iii) ranging over a set of alternatives which must
be heuristically searched for as part of the rule execution (e.g., "analogous" and
"neighbors" only make sense after a nontrivial amount of searching has been
performed).
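
The first two kinds of matching in the rule of enthusiasm can be sketched in a few
lines. This is a deliberately reduced illustration, not AM's implementation: the classes
Concept and Job, the 0..1000 interest scale, and the thresholds high, low and delta are
all assumptions. The third kind of matching is not attempted: "analogous to m" is
taken to mean "identical to m", and "neighbors" are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Concept:
    name: str
    interest: int                              # hypothetical 0..1000 scale
    derived: Optional[Tuple[str, str]] = None  # (source concept C, means m)


@dataclass
class Job:
    means: str    # e.g. "generalize" or "specialize"
    target: str   # name of the concept the job would operate on
    rating: int


def rule_of_enthusiasm(concepts: List[Concept], agenda: List[Job],
                       high: int = 800, low: int = 200, delta: int = 100) -> None:
    """Raise or lower the rating of jobs that apply means m to C or to X."""
    for x in concepts:                  # matching type (ii): existing concepts
        if x.derived is None:
            continue
        source, means = x.derived
        if x.interest >= high:          # matching type (i): the high/low choice
            change = delta              # very interesting: more consideration
        elif x.interest <= low:
            change = -delta             # dead end: less consideration
        else:
            continue
        for job in agenda:
            # Simplification: exact match on the means stands in for the
            # heuristic search that "analogous" would really require.
            if job.means == means and job.target in (source, x.name):
                job.rating += change
```

Fixing the direction to "high" and the means to "generalize" yields the first of the two
specific rules quoted earlier; fixing them to "low" and "specialize" yields the second.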

Since the "rule of enthusiasm" is very general, it will only be tried if no more specific
rules (such as the two which were listed just above it) are relevant at the time.
Ideally, the search to specify the action should create a new, specialized form of the
rule of enthusiasm to catch this situation and handle it quickly, should it arise again.
Note that versions of this schema that mention generalization or specialization are also
schemata (without any specification search); they are simply less general schemata
than the rule of enthusiasm itself. Whenever a new subject for discovery gets defined,
the abstract, hard-to-execute rule schemata can be specialized (compiled, refined,
etc.) into efficient heuristics for that subject.

Another use of a rule schema might be to name a collection of neo-classical rules that
are coupled by together fulfilling a single function. Consider a collection of rules
which is tightly coupled, say to perform an iteration. Much knowledge about the
iteration loop as a whole may exist. Where is such descriptive information to be stored
and sought? Either it must be duplicated for each of the coupled rules, or there must
be a rule-like concept which "knows about" the iteration as one coherent unit. We
conclude that even if some intertwined rules are kept separate, an extra rule (a
schema) should exist which (at least implicitly) has a rhs which combines them (by
containing knowledge common to all of them). Thus rule schemata do more than just
unify general properties of rules: there must also be schemata of the kind that relate
function to mechanism.

Another problem crops up if we consider what happens if one of the coupled rules is
modified. Often, some corresponding change should be made in all its companions. For
example, if a term is generalized (replacement of "prime" by "number" everywhere)
then the same substitution had probably better be done in each rule with which this
one is supposed to couple. What we are saying is that, for RSs which modify their
own rules, it can be dangerous to split up a single conceptual process into a bunch of
rules which interact in more or less fixed ways when run, without continuing to reason
about them as an integrity, like any other algorithm composed of parts. Here again,
we find pressure to treat RSs as algorithms, not vice-versa.

Finally, let us make a few irresistible observations. The whole notion of coupling via
meaningless tokens is aesthetically repugnant and quite contrary to "pure" production
system spirit. By "meaningless" we mean entries in DS that provide a narrow
hand-crafted channel of communication between two specific rules that therefore "know
about each other".^ At the least, when a coupled rule deposits some
"intermediate-state" message in a DS, one would like that message to be meaningful to
many rules in the system, to have some significance itself. We can see that entries in a
DS have an expected meaning to the read access functions that examine the DS.^ If this
purity is maintained, then any apparent "coupling" would be merely superficial; each rule
could stand alone as a whole domain-dependent heuristic. Thus no harm should come from
changing a single rule, and more rules could be added that act on the "intermediate
message" of the coupling. Such meaningful, dynamic couplings should be encouraged.
Only the meaningless, tight couplings are being criticized here.


4.3.  Distribution  of  Knowledge  Between  Rules  and  DS

A common  "pure"  idea  is  that  all  knowledge  of  substance  ought  to  be  represented  as 
rules.  Independent  of  such  rules,  the  DS  forms  no  meaningful  whole  initially,  nor  has  it 
any  final  interpretation.  The  "answer"  which  the  RS  computes  is  not  stored  in  the  DS; 
rather,  the  answer  consists  in  the  process  of  rule  firings.^  The  DS  is  "just"  an 
intermediate  vehicle  of  control  information. 

Contrary to this, we say that rules ought to have a symbiotic relationship to DSs. The
DSs hold meaningful domain-dependent information, and rules process knowledge
represented in them. For RSs designed to perform scientific research, the DSs contain
the theory, and the rules contain methods of theory formation.

But much domain-dependent knowledge is conditional. E.g., "If n and m are relatively
prime and divide x, then so must nm". Shouldn't such If/Then information be encoded
as rules? We answer an emphatic No. Just as there is a distribution of "all knowledge
of substance" between rules and DSs, so too must the conditional information be
partitioned between them. We shall illustrate two particular issues: (i) Much
information can be stored implicitly in DSs; (ii) Some conditional knowledge is
inappropriate to store as rules.


^ By contrast, a "meaningful" DS entry will embody a piece of information which is specific to the RS's task, not
to the actual rules themselves.

^ Perhaps this "meaning" could even be expressed formally as an invariant which the write access functions for
the DS must never violate.

^ The sequence of actions in time. In addition, perhaps, the "answer" may involve a few of their side-effects.
E.g., (Respond 'YES').

When designing a DS, it is possible to provide mechanisms for holding a vast amount of
information implicitly. In AM, e.g., the organization of concepts into a Genl/Spec
hierarchy (plus the assumed heritability properties; see 3.4) permits a rule to ask for
"all concepts more general than Primes" as if that were a piece of data explicitly
stored in a DS. In fact, only direct generalizations are stored ("The immediate
generalization of Primes is Numbers"), and a "rippling" mechanism automatically runs up
the Genl links to assemble a complete answer. Thus the number of specific answers the
DS can provide is far greater than the number of individual items in the DS. True,
these DS mechanisms will use up extra time in processing to obtain the answer; this is
efficient since any particular request is very unlikely to be made. Just as each rule
knows about a general situation, of which it will only see a few instances, that same
quality (of wide potential applicability) is just as valuable for knowledge in DSs. These
are situations where, like Dijkstra's multiplier [8], the mechanism must provide any of
the consequences of its knowledge quickly on demand, but in its lifetime will only be
asked a few of them.
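
The "rippling" access described above can be sketched as a short graph walk. Only the
link from Primes to Numbers comes from the text; the other entries in the table, and
the names GENL and generalizations, are assumptions made for illustration and do not
reproduce AM's actual concept graph.

```python
from typing import Dict, List, Set

# Only immediate generalizations are stored. Apart from Primes -> Numbers,
# these links are invented for the example.
GENL: Dict[str, List[str]] = {
    "Primes": ["Numbers"],
    "Numbers": ["Bags"],
    "Bags": ["Structures"],
    "Structures": ["Anything"],
    "Anything": [],
}


def generalizations(concept: str, genl: Dict[str, List[str]] = GENL) -> List[str]:
    """Return every concept more general than `concept`, nearest first."""
    found: List[str] = []
    seen: Set[str] = {concept}
    frontier = list(genl.get(concept, []))
    while frontier:
        node = frontier.pop(0)
        if node in seen:        # guard against revisiting shared ancestors
            continue
        seen.add(node)
        found.append(node)
        frontier.extend(genl.get(node, []))   # ripple one more level up
    return found
```

A rule calling generalizations("Primes") receives the full chain as if it had been stored
explicitly, although the table holds one link per concept.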

Now that we have seen how tacit information can be encoded into DSs, let us see some
cases where it should be -- i.e., where it is not appropriate to encode it as rules of
the system. Many things get called implication, and only some of them correspond to
rule application. For instance, there is logical entailment (e.g., if A∧B then A), physical
causation (e.g., if it rains, then the ground will get wet), probable associations (e.g., if it
is wet underfoot, then it has probably been raining). These all describe the way the
world is, not the way the perceiver of the world behaves. Contrast them with
knowledge of the form "If it is raining, then open the umbrella". We claim that this last
kind of situation-action relationship should be encoded as rules for the RS, but that the
other types of implication should be stored declaratively within the DS. Let's try to
justify this distinction.

The situation-action rules indicate imperatively how to behave in the world; the other
types of implication merely indicate expected relationships and tendencies within the
world. The rules of a RS are meant to indicate potential procedural actions which are
obeyed by the system, while the DSs indicate the way the world (the RS's environment)
behaves in terms of some model of it. The essential thing to consider is what relations
are to be caused in time; these are the things we should cast as rules. The lhs of a
rule measures some aspect of knowledge presently in DSs, while the rhs of the rule
defines the attention of the system (regarded as a processor feeding off of the DS) in
the immediate future.

This is the heart of why rule-sets are algorithms. They are algorithms for guiding the
application of other (DS processing) algorithms. It also explains why other kinds of
implications are unsuitable to be rules. Consider causal implication ("Raining --> Wet").
While the lhs could be a rule's lhs (it measures an aspect of any situation), the rhs
should not be a rule's rhs (it does not indicate an appropriate action for the system to
take).®

Most purist production systems have (often implicitly) a rule of the form "If the left
side of an implication is true in the database, Then assert the right side". This is only
one kind of rule, of course, capable of dealing with implications. For example, MYCIN
and LT [17] (implicitly) follow a very different rule: "If the rhs of an implication will
satisfy my goal, Then the lhs of the implication is now the new goal". Other rules are
possible; many rules for reasoning may feed off the same "table" of world knowledge.
The point is that the implications themselves are declarative knowledge, not rules. In
summary, then, it may be very important to distinguish rules (attention guides) from
mere implications (access guides), and to store the latter within the DSs. This policy
was not motivated by the scientific inference task for our RS. We believe it to be a
worthwhile guideline in the design of any RS.
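
The distinction drawn above can be illustrated with a minimal sketch. It is not taken
from AM, MYCIN, or LT; the table contents and the function names (IMPLICATIONS,
forward_assert, backward_subgoals) are hypothetical. The implications sit in one
declarative table, and two quite different rules feed off that same table.

```python
# Illustrative sketch only; all names here are assumptions, not AM/MYCIN/LT code.

# Declarative world knowledge, stored in a DS as (antecedent, consequent) pairs.
IMPLICATIONS = [
    ("raining", "ground_wet"),
    ("ground_wet", "shoes_muddy"),
]


def forward_assert(facts, implications):
    """Rule: IF the left side of an implication is true in the database,
    THEN assert the right side. Repeats until nothing new is added."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in implications:
            if lhs in facts and rhs not in facts:
                facts.add(rhs)
                changed = True
    return facts


def backward_subgoals(goal, implications):
    """Rule: IF the rhs of an implication will satisfy my goal,
    THEN the lhs of that implication becomes a new goal.
    Returns the goals in the order in which they were set up."""
    goals, seen = [goal], {goal}
    i = 0
    while i < len(goals):
        current = goals[i]
        for lhs, rhs in implications:
            if rhs == current and lhs not in seen:
                goals.append(lhs)
                seen.add(lhs)
        i += 1
    return goals
```

Neither function changes the table; swapping one reasoning rule for the other leaves
the stored implications untouched, which is the sense in which they are data rather
than rules.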


4.4.  Interpreter 

After a rule fires, the neo-classical interpretation policy (#9 in Figure 1) demands that
any rule in the system can potentially be the next one selected to fire. This is true
regardless of the speed-up techniques used in any particular implementation (say, by
preprocessing the lhs's into a discrimination net [2?]). But consider RSs for scientific
discovery tasks. Their task -- both at the top level and frequently at lower levels -- is
quite open-ended. If twenty rules trigger as relevant to such an open-ended activity
(e.g., gathering empirical data, inducing conjectures, etc.) then there is much motivation
for continuing to execute just these twenty rules for a while. They form an ad hoc
plausible search algorithm for the agenda item selected.

A RS for discovery might reasonably be given a complex interpreter (rule-firing
policy). AM, for example, experimented with a two-pass interpreter: first, a best-first,
agenda-driven resource allocator and attention focusser selects the job it finds most
interesting; second, it locates the set of relevant rules (typically about 30 to 40 rules)
for the job, and begins executing them one after another (in best-first order of
specificity) until the resources allocated in the first step run out [?0]. The overall
rating of the job which these rules are to satisfy determines the amount of cpu time
and list cells that may be used up before the rules are interrupted and the job is
abandoned.

For example, say the job were "Find examples of Primes". It is allotted 35 cpu seconds
and 300 list cells, due to its overall priority rating just before it was plucked from the
agenda. Say 24 rules are relevant. The first one quickly finds that "2" and "3" are
primes. Should the job halt right then? No, not if the real reason for this job is to
gather as much data as possible, data from which conjectures will be suggested and
tested. In that case, many of the other 23 rules should be fired as well. They will
produce not only additional examples, but perhaps other types of examples.

The jobs on AM's agenda are really just mini-research questions which are plausible to
spend time investigating. Although phrased as specific requests, each one is really a
research proposal, a topic to concentrate upon. We found it necessary to deviate from
the simplest uniform interpreter for clarity (e.g., a human can follow the first pass (job
selection) taken alone and can follow the second pass (job execution) by itself), for
efficiency (knowing that all 24 rules are relevant, there is no need to find them 35
times), and for power (applying qualitatively different kinds of rules yields various
types of examples). We claim this quality of open-endedness will recur in any RS
whose task is free concept exploration. This includes all scientific discovery but not all
scientific inference.
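
The two-pass policy described in this section can be sketched as follows. This is a
minimal illustration under stated assumptions, not AM's actual interpreter: the names
Rule, Job and run_agenda are hypothetical, and a single abstract "budget" of cost units
stands in for the cpu seconds and list cells mentioned above.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Rule:
    name: str
    specificity: int                    # higher = more specific, tried first
    relevant_to: str                    # the kind of job this rule applies to
    action: Callable[[Set[int]], None]  # writes its findings into the job's results
    cost: int = 1                       # abstract resource units consumed (assumption)


@dataclass(order=True)
class Job:
    neg_priority: int                   # heapq is a min-heap, so store -priority
    kind: str = field(compare=False)
    budget: int = field(compare=False)  # stands in for cpu seconds / list cells


def run_agenda(agenda: List[Job], rules: List[Rule]) -> Dict[str, Set[int]]:
    """Pass 1 selects the best job; pass 2 fires its relevant rules until the
    job's budget runs out, not merely until one rule has succeeded."""
    heapq.heapify(agenda)
    results: Dict[str, Set[int]] = {}
    while agenda:
        job = heapq.heappop(agenda)     # pass 1: best-first job selection
        relevant = sorted((r for r in rules if r.relevant_to == job.kind),
                          key=lambda r: -r.specificity)  # located once per job
        found: Set[int] = set()
        remaining = job.budget
        for rule in relevant:           # pass 2: keep firing relevant rules
            if rule.cost > remaining:
                break                   # resources exhausted; abandon the job
            rule.action(found)
            remaining -= rule.cost
        results[job.kind] = found
    return results


# Hypothetical rules for the "Find examples of Primes" job.
rules = [
    Rule("known-small-primes", 3, "examples-of-primes",
         lambda s: s.update({2, 3})),
    Rule("trial-division-below-20", 2, "examples-of-primes",
         lambda s: s.update(n for n in range(2, 20)
                            if all(n % d for d in range(2, n)))),
    Rule("expensive-search", 1, "examples-of-primes",
         lambda s: s.add(97), cost=10),
]
results = run_agenda([Job(-500, "examples-of-primes", budget=5)], rules)
```

In this run the first rule finds 2 and 3, the job does not halt, and the second rule
contributes further examples; the third rule is never fired because its cost exceeds
what remains of the budget.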


5.  Speculations for a New Discovery System

The spirit of this paper has been to give up straightforward simplicity in RSs for
clarity, efficiency, and power. Several examples have been cited, but we speculate that
there are further tradeoffs of this kind which are applicable to RSs whose purpose is
to make new discoveries.

Often, there are several possible ways the designer may view the task of (and
subtasks of) the intended RS. We wish to add the notion of "proof" to AM, say. Should
we represent proof as a resolution search, as a process of criticism and improvement
(111  spiralling  toward  a solution,  as  a natural  deduction  cascade,  Although  any  one 
of these task-views might perform respectably, we advocate the incorporation of all of
them, despite the concomitant costs of added processing time, space, and interfacing.
In fact, we wish never to exclude the possibility of the system acquiring another
task-view.

We look for the development of further discovery tools in the form of
domain-independent meta-heuristics that synthesize heuristic rules, and in the form of
abstract heuristic schemata that specialize into efficient rules for each newly-discovered
domain. These discovery tools are all part of "getting familiar" with shallowly
understood concepts, such as synthesized ones tend to be initially. It may even be
that symbolic analogy techniques exist, cutting across the traditional boundaries of
knowledge domains.

We contemplate a system that keeps track of (and has methods with which it attempts
to improve) the design of its own DSs, its own control structure, and perhaps even its
own design constraints. Although working in (a collection of) specific domains, this
would be a general symbol system discoverer, capable of picking up and exploring
formulations, testing them and improving them.


5.1.  A New  Set  of  Design  Constraints 

Below are 13 principles for designing a RS whose task is that of scientific theory
formation. They are the result of reconsidering the original principles (Figure 1) in the
light shed by work on AM. Most of the "pure" principles we mentioned in Figure 1 are
changed, and a few new ones have emerged.

FIGURE 3: Scientific Discovery RS Architecture

1.  Principle of Several Appropriate Memories. For each type of knowledge
which must be dealt with in its own way, a separate DS should be
maintained. The precise nature of each DS should be chosen so as to
facilitate the access (read/write) operations which will be most commonly
requested of it.

2.  Principle of Maximal DS Accesses. The set of primitive DS access operations
(i.e., the read tests which a rule's lhs may perform, and the write actions
which a rhs may call for) are chosen to include the largest packages (clusters,
chunks,...) of activity which are commonly needed and which can be
performed efficiently on the DS.

3.  Principle of Facetted DS Elements. For ever-growing data structures, there is
much to be gained and little lost by permitting parts of one DS item to point
to other DS items. In particular, schematic techniques of representing content
by structure are now possible.

4.  Principle of Rules as Data. The view which the RS designer takes of the
system's task may require that some rules be capable of reasoning about the
rules in the RS (adding new ones, deleting old ones, keeping track of rules'
performance, modifying existing rules,...). Some of the methods the RS uses to
deal with scientific knowledge may be applicable to dealing with rules as
well. In such cases, the system's rules may thus be naturally represented as
new entries in the existing DS which holds the scientific theory.

5.  Principle  of  Regularities  Among  Rules.  Each  rule  is  actually  a rule  schema. 
Sophisticated  processing  may  be  needed  both  to  determine  which  instance(s) 
are  relevant  and  to  find  the  precise  sequence  of  actions  to  be  executed.  Such 
schemata  are  often  quite  elaborate. 

6.  Principle of Avoiding Meaninglessly-Coupled Rules. Passing special-purpose
loop control notes back and forth is contrary both to the spirit of pure RSs
and to efficiency. If rules are to behave as coupled, the least we demand is
that the notes they write and read for each other be meaningful entries in DS
(any other rule may interpret the same note, and other rules might have
written one identical to it).

7.  Principle of Controlled Environment. For many tasks, it is detrimental to
permit external stimuli (from an environment) to enter any DS at random.
At the least, the RS should be able to distinguish these alien inputs from
internally-generated DS entries.

8.  Principle of Tacit Knowledge. In designing the DS, much knowledge may be
stored implicitly: e.g., by where facts are placed in a hierarchical network.
The DS should be designed so as to maximize this kind of concentrated,
analogical information storage. Hence, hard-working access functions are
needed to encode and decode the full meaning of DSs.

9.  Principle of Named Algorithms. When basic, "how to" knowledge is available,
it should be packaged as an operation and used as a part of the lhs or rhs of
various rules. Embodying this chunk of knowledge as several coupled rules is
not recommended, for we will want to manipulate and utilize this knowledge
as a whole.

10.  Principle of Rules as Attention Guides. Knowledge should be encoded as rules
when it is intended to serve as a guide of the system's attention; to direct its
behavior. Other kinds of information, even if stated in conditional form,
should be relegated to DSs (either explicitly as entries, or implicitly as special
access functions).

11.  Principle of Inertial Interpreter. In tasks like scientific research, where
relevant rules will be performing inherently open-ended activities (e.g.,
data-gathering), such rules should be allowed to continue for a while even after
they have nominally carried out the activity (e.g., gathered one piece of
data). In such cases, the occasional wasted time and space is more than
compensated for by the frequent acquisition of valuable knowledge that was
concentrated in the later rules. For scientific discovery, no single rule
(however "appropriate") should be taken as sufficient: a single rule must
necessarily view the task in just one particular way. All views of the task
have something to contribute; hence variety depends on a policy of always
applying several rules.

12.  Principle of Openness. A discovery rule system can be enriched by
incorporating into its design several independent views of the knowledge it
handles. Never assume everything is known about a class of knowledge. All
appropriate formulations of a knowledge class have something to contribute;
hence variety depends on openness to new formulations.

13.  Principle of Support of Discovery by Design. By representing its own design
explicitly (say, as concepts), the RS could study and improve those concepts,
thereby improving itself. This includes the DS design, the access function
algorithms, how to couple them, the function of various rules, the
interpretation policy of the RS, etc. This suggests that the study of designs
of computational mechanisms may be a worthy area for a discovery system
to pursue, whether its own design is available to it or not.
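
As a rough illustration of how principles 3, 4 and 7 might fit together in one data
structure, consider the sketch below. It is an assumption-laden toy, not a description
of AM: the Entry and Store classes, the facet names, and the entries "Rule-17" and
"UserHint-1" are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Entry:
    """One DS item. Facet values may name other entries (principle 3)."""
    name: str
    origin: str = "internal"  # "internal" or "external" (principle 7)
    facets: Dict[str, List[Any]] = field(default_factory=dict)


class Store:
    """A single DS holding concepts and rules alike (principle 4)."""

    def __init__(self) -> None:
        self.entries: Dict[str, Entry] = {}

    def add(self, entry: Entry) -> Entry:
        self.entries[entry.name] = entry
        return entry

    def follow(self, name: str, facet: str) -> List[Entry]:
        """Dereference a facet whose values are names of other entries."""
        values = self.entries[name].facets.get(facet, [])
        return [self.entries[v] for v in values if v in self.entries]

    def external(self) -> List[str]:
        """Names of entries that came from outside the system."""
        return sorted(n for n, e in self.entries.items() if e.origin == "external")


ds = Store()
ds.add(Entry("Numbers", facets={"specializations": ["Primes"]}))
ds.add(Entry("Primes", facets={"generalizations": ["Numbers"],
                               "examples": [2, 3, 5]}))
# A rule stored as an ordinary entry, so other rules can inspect its record.
ds.add(Entry("Rule-17", facets={"applies_to": ["Primes"], "times_fired": [0]}))
# An input from the environment, tagged so that it can be told apart.
ds.add(Entry("UserHint-1", origin="external", facets={"about": ["Primes"]}))
```

Because the rule is just another entry, the same access function that walks from
"Primes" to "Numbers" also walks from "Rule-17" to the concept it applies to.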


Rule systems whose designs adhere to these guidelines will be large, elaborate, and
non-classical. We have mentioned throughout the paper several new complications
which the principles introduce. Trying to produce such a RS for a task for which a
pure, neo-classical production rule system was appropriate will probably result in
disaster. Nevertheless, empirical evidence suggests that RSs having this architecture
are quite natural and relatively tractable to construct for open-ended tasks like
scientific discovery.